 physical environment


Supplementary Material

Neural Information Processing Systems

Tab. 13 shows the parameters and variables used in this optimization. Table 13: Parameters and variables used in credit optimization. Known Parameters Description ϱ = R Eq. 5 presents the optimization formulation, where Eq. 5a calculates the total credits gained by the The following examples illustrate the prompts used in LLM-C for each mini-game. The prompts vary slightly for different mini-games and also differ across stages within the same mini-game. Specifically, the prompt for the dynamic scenario in Social Structure is presented in Listing 1. The corresponding prompts are provided in Listing 4 and Listing 5. Listing 1: Prompt example for dynamic scenario in Social Structure. Instructions: - The AdaSociety game is an open-ended multi-agent environment. The game consists of a complex crafting tree, where the agent needs to obtain as many resources as possible in the limited time and craft tools to mine more advanced resources to maximize its benefit. At the same time, agents can also take other actions to help them increase their returns. The numbers of resources are limited. - Map: AdaSociety is a 2D grid-world game. The map size is 15*15. - Some of them can only be discovered with some specific tools, which will be introduced next. -


Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark

Shen, Xinjie, Li, Mufei, Li, Pan

arXiv.org Artificial Intelligence

The deployment of Large Language Models (LLMs) in embodied agents creates an urgent need to measure their privacy awareness in the physical world. Existing evaluation methods, however, are confined to natural language based scenarios. To bridge this gap, we introduce EAPrivacy, a comprehensive evaluation benchmark designed to quantify the physical-world privacy awareness of LLM-powered agents. EAPrivacy utilizes procedurally generated scenarios across four tiers to test an agent's ability to handle sensitive objects, adapt to changing environments, balance task execution with privacy constraints, and resolve conflicts with social norms. Our measurements reveal a critical deficit in current models. The top-performing model, Gemini 2.5 Pro, achieved only 59% accuracy in scenarios involving changing physical environments. Furthermore, when a task was accompanied by a privacy request, models prioritized completion over the constraint in up to 86% of cases. In high-stakes situations pitting privacy against critical social norms, leading models like GPT-4o and Claude-3.5-haiku disregarded the social norm over 15% of the time. These findings, demonstrated by our benchmark, underscore a fundamental misalignment in LLMs regarding physically grounded privacy and establish the need for more robust, physically-aware alignment. Code and datasets will be available at https://github.com/Graph-COM/EAPrivacy.


Synchron's Brain-Computer Interface Now Has Nvidia's AI

WIRED

Neurotech company Synchron has unveiled the latest version of its brain-computer interface, which uses Nvidia technology and the Apple Vision Pro to enable individuals with paralysis to control digital and physical environments with their thoughts. In a video demonstration at the Nvidia GTC conference this week in San Jose, California, Synchron showed off how its system allows one of its trial participants, Rodney Gorham, who is paralyzed, to control multiple devices in his home. From his sun-filled living room in Melbourne, Australia, Gorham is able to play music from a smart speaker, adjust the lighting, turn on a fan, activate an automatic pet feeder, and run a robotic vacuum. Gorham has lost the use of his voice and much of his body due to having amyotrophic lateral sclerosis, or ALS. The degenerative disease weakens muscles over time and eventually leads to paralysis.


AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making

Huang, Yizhe, Wang, Xingbo, Liu, Hao, Kong, Fanqi, Qin, Aoyang, Tang, Min, Zhu, Song-Chun, Bi, Mingjie, Qi, Siyuan, Feng, Xue

arXiv.org Artificial Intelligence

Traditional interactive environments limit agents' intelligence growth with fixed tasks. Recently, single-agent environments address this by generating new tasks based on agent actions, enhancing task diversity. We consider the decision-making problem in multi-agent settings, where tasks are further influenced by social connections, affecting rewards and information access. However, existing multi-agent environments lack a combination of adaptive physical surroundings and social connections, hindering the learning of intelligent behaviors. To address this, we introduce AdaSociety, a customizable multi-agent environment featuring expanding state and action spaces, alongside explicit and alterable social structures. As agents progress, the environment adaptively generates new tasks with social structures for agents to undertake. In AdaSociety, we develop three mini-games showcasing distinct social structures and tasks. Initial results demonstrate that specific social structures can promote both individual and collective benefits, though current reinforcement learning and LLM-based algorithms show limited effectiveness in leveraging social structures to enhance performance. Overall, AdaSociety serves as a valuable research platform for exploring intelligence in diverse physical and social settings.


MegaCOIN: Enhancing Medium-Grained Color Perception for Vision-Language Models

Chiu, Ming-Chang, Wen, Shicheng, Chen, Pin-Yu, Ma, Xuezhe

arXiv.org Artificial Intelligence

In vision-language models (VLMs), the ability to perceive and interpret color and physical environment is crucial for achieving contextually accurate understanding and interaction. However, despite advances in multimodal modeling, there remains a significant lack of specialized datasets that rigorously evaluate a model's capacity to discern subtle color variations and spatial context -- critical elements for situational comprehension and reliable deployment across real-world applications. Toward that goal, we curate MegaCOIN, a high-quality, human-labeled dataset based on real images with various contextual attributes. MegaCOIN consists of two parts: MegaCOIN-Instruct, which serves as a supervised fine-tuning (SFT) dataset for VLMs; and MegaCOIN-Bench, an annotated test set that can be used as a stand-alone QA dataset. MegaCOIN provides three annotated features for 220,000 real images: foreground color, background color, and description of an object's physical environment, constituting 660k human annotations. In addition, MegaCOIN can be applied to benchmark domain generalization (DG) algorithms. We explore benchmarking DG methods in the linear probing setup for VLMs and show some new insights. Last but not least, we show that VLMs, including GPT-4o, have subpar color recognition capabilities, and fine-tuning with MegaCOIN can result in improved performance on visual evaluation tasks. In certain cases, small-scale open-source models such as LLaVA and Bunny, once fine-tuned with MegaCOIN, can outperform closed-source GPT-4o. We hope the utilities of MegaCOIN can shed light on the directions in which VLMs can improve and provide a more complex platform for domain generalization algorithms.


Flashy Backdoor: Real-world Environment Backdoor Attack on SNNs with DVS Cameras

Riaño, Roberto, Abad, Gorka, Picek, Stjepan, Urbieta, Aitor

arXiv.org Artificial Intelligence

While security vulnerabilities in traditional Deep Neural Networks (DNNs) have been extensively studied, the susceptibility of Spiking Neural Networks (SNNs) to adversarial attacks remains mostly underexplored. Until now, the mechanisms to inject backdoors into SNN models have been limited to digital scenarios; thus, we present the first evaluation of backdoor attacks in real-world environments. We begin by assessing the applicability of existing digital backdoor attacks and identifying their limitations for deployment in physical environments. To address each of the found limitations, we present three novel backdoor attack methods on SNNs, i.e., Framed, Strobing, and Flashy Backdoor. We also assess the effectiveness of traditional backdoor procedures and defenses adapted for SNNs, such as pruning, fine-tuning, and fine-pruning. The results show that while these procedures and defenses can mitigate some attacks, they often fail against stronger methods like Flashy Backdoor or sacrifice too much clean accuracy, rendering the models unusable. Overall, all our methods can achieve up to a 100% Attack Success Rate while maintaining high clean accuracy in every tested dataset. Additionally, we evaluate the stealthiness of the triggers with commonly used metrics, finding them highly stealthy. Thus, we propose new alternatives more suited for identifying poisoned samples in these scenarios. Our results show that further research is needed to ensure the security of SNN-based systems against backdoor attacks and their safe application in real-world scenarios. The code, experiments, and results are available in our repository.


Multi-View Black-Box Physical Attacks on Infrared Pedestrian Detectors Using Adversarial Infrared Grid

Tiliwalidi, Kalibinuer, Hu, Chengyin, Shi, Weiwen

arXiv.org Artificial Intelligence

While extensive research exists on physical adversarial attacks within the visible spectrum, studies on such techniques in the infrared spectrum are limited. Infrared object detectors are vital in modern technological applications but are susceptible to adversarial attacks, posing significant security threats. Previous studies using physical perturbations like light bulb arrays and aerogels for white-box attacks, or hot and cold patches for black-box attacks, have proven impractical or limited in multi-view support. To address these issues, we propose the Adversarial Infrared Grid (AdvGrid), which models perturbations in a grid format and uses a genetic algorithm for black-box optimization. These perturbations are cyclically applied to various parts of a pedestrian's clothing to facilitate multi-view black-box physical attacks on infrared pedestrian detectors. Extensive experiments validate AdvGrid's effectiveness, stealthiness, and robustness. The method achieves attack success rates of 80.00% in digital environments and 91.86% in physical environments, outperforming baseline methods. Additionally, the average attack success rate exceeds 50% against mainstream detectors, demonstrating AdvGrid's robustness. Our analyses include ablation studies, transfer attacks, and adversarial defenses, confirming the method's superiority.


Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

Ma, Longfei, Cheng, Nan, Wang, Xiucheng, Chen, Jiong, Gao, Yinjun, Zhang, Dongxiao, Zhang, Jun-Jie

arXiv.org Artificial Intelligence

The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, specifically designed to enhance the accuracy and utility of DTs in testing algorithmic performance. We propose a novel construction methodology that integrates deep learning-based policy gradient techniques to dynamically tune the DT parameters, ensuring high fidelity in the digital replication of physical systems. Moreover, the Mean STate Error (MSTE) is proposed as a robust metric for evaluating the performance of algorithms within these digital spaces. The efficacy of our framework is demonstrated through extensive simulations that show our DT not only accurately mirrors the physical reality but also provides a reliable platform for algorithm evaluation. This work lays a foundation for future research into DT technologies, highlighting pathways for both theoretical enhancements and practical implementations in various industries.


DeepRacer on Physical Track: Parameters Exploration and Performance Evaluation

Koparan, Sinan, Javadi, Bahman

arXiv.org Artificial Intelligence

This paper focuses on the physical racetrack capabilities of AWS DeepRacer. Two separate experiments were conducted. The first experiment (Experiment I) evaluated the impact of hyperparameters on performance in the physical environment. Hyperparameters such as gradient descent batch size and loss type were changed systematically, as were training time settings. The second experiment (Experiment II) explored AWS DeepRacer object avoidance in the physical environment. In the simulated environment, models with a higher gradient descent batch size performed better than models with a lower gradient descent batch size. In the physical environment, by contrast, a gradient descent batch size of 128 appears to be preferable. Models using the Huber loss type outperformed models using the MSE loss type in both the simulated and physical environments. Finally, object avoidance appeared to be effective in the simulated environment; however, when these models were brought to the physical environment, they showed pronounced difficulty avoiding objects. Therefore, object avoidance in the physical environment remains an open challenge.